Generating Vector Code for Matrix-matrix Multiplication
Authors
Abstract
The current state-of-the-art matrix-matrix multiplication (MMM) kernel is ATLAS, which generates the best-performing MMM code by search. However, computer architectures change rapidly, and it is hard to generate high-performance code without knowing how to use new instruction sets. Since ATLAS does not make use of blocking for the L2 cache or of SSE/SSE2 instructions, we set out to improve ATLAS to obtain higher MMM performance than the original. Our experimental results show that we can obtain high performance using SSE/SSE2, which is available on the newer generations of Pentium processors.
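To make the two ideas in the abstract concrete, the following is a minimal sketch of a cache-blocked MMM whose innermost loop uses SSE2 intrinsics. It is not the code that ATLAS or the authors' generator emits. The function name `mmm_blocked`, the row-major square layout, and the fixed tile size `BLOCK` are assumptions made for illustration; a search-based generator would tune the tile size per machine. On targets without SSE2, the sketch falls back to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics: two doubles per 128-bit register */
#endif

/* Hypothetical tile size; a real generator would find this by search. */
enum { BLOCK = 32 };

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Computes C += A * B for row-major n x n matrices of doubles. */
void mmm_blocked(size_t n, const double *A, const double *B, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);

    /* Outer three loops walk over tiles so that the working set of the
       inner loops stays small enough to be reused from cache. */
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        size_t i_end = min_sz(ii + BLOCK, n);
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            size_t k_end = min_sz(kk + BLOCK, n);
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t j_end = min_sz(jj + BLOCK, n);

                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t k = kk; k < k_end; ++k) {
                        double a = A[i * n + k];
                        size_t j = jj;
#if defined(__SSE2__)
                        /* Broadcast A[i][k] and update two C elements at once.
                           Unaligned loads/stores keep the sketch safe for any n. */
                        __m128d va = _mm_set1_pd(a);
                        for (; j + 2 <= j_end; j += 2) {
                            __m128d vb = _mm_loadu_pd(&B[k * n + j]);
                            __m128d vc = _mm_loadu_pd(&C[i * n + j]);
                            vc = _mm_add_pd(vc, _mm_mul_pd(va, vb));
                            _mm_storeu_pd(&C[i * n + j], vc);
                        }
#endif
                        /* Scalar remainder (or the whole row without SSE2). */
                        for (; j < j_end; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

A caller would zero-initialize `C` and then call `mmm_blocked(n, A, B, C)`. The sketch uses unaligned loads for simplicity; aligned loads and register tiling would likely be needed to approach the performance levels the abstract refers to.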
Similar resources
Optimization by Run-time Specialization for Sparse Matrix-Vector Multiplication (Submitted for publication)
Run-time specialization is the process of generating programs based on information available only at run time. This technique has the potential of generating highly efficient codes, at the expense of the overheads of the run-time code generation. It is applicable when some input data is used repeatedly while other input data varies. In this paper we explore the potential for obtaining speedups ...
Optimization of Sparse Matrix-Vector Multiplication by Specialization
Program specialization is the process of generating optimized programs based on available inputs. It is particularly applicable when some input data are used repeatedly while other input data vary. Specialization can be employed at compile-time as well as at run-time, depending on when the inputs become available. In this paper we explore the potential for obtaining speed-ups for sparse matrix-...
A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure
The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; a method that can run on all structures, especially the Fibonacci Hypercube structure, is therefore necessary for parallel matr...
Design of Logic Network for Generating Sequency Ordered Hadamard Matrix H
A logic network that produces the sequency-ordered Hadamard matrix H, based on the properties of the Gray code and orthogonal group codes, is developed. The network uses a counter to generate Rademacher functions such that the output of H will be in sequency order. A general-purpose shift register with output logic is used to establish a sequence of period P corresponding to a given value of order m of the Hadam...
Generating Optimized Sparse Matrix Vector Product over Finite Fields
Sparse matrix-vector multiplication (SpMV) is one of the most important operations in exact sparse linear algebra. A lot of research has been done by the numerical community to provide efficient sparse matrix formats. However, when computing over finite fields, one needs to deal with multi-precision values and more complex operations. In order to provide highly efficient SpMV kernel over finite ...